Multi-label Patent Classification at NTT Communication Science Laboratories
Abstract
We design a multi-label classification system that combines binary classifiers for the classification subtask of the NTCIR-6 Patent Retrieval Task. The system has one binary classifier per F-term, which determines whether that F-term is assigned to a patent document. Hybrid classifiers are employed as the binary classifiers so that the multiple components of patent documents are used effectively. Each hybrid classifier is constructed by combining component generative models with weights based on the maximum entropy principle. Using a test collection of Japanese patent documents, we confirmed that our system ranks F-terms well when assigning them to patent documents.
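As a rough illustration of the one-binary-classifier-per-label idea described in the abstract, the following minimal Python sketch scores each label independently and ranks labels by score. It is not the authors' system. The component names (`title`, `claims`), the toy documents, the labels `A` and `B`, and the choice of Laplace-smoothed naive Bayes as the component generative model are all assumptions made for the example. In particular, the combination weights are fixed by hand here, whereas the abstract says they are based on the maximum entropy principle.

```python
import math
from collections import Counter

# Hypothetical component names; the abstract only says "multiple components".
COMPONENTS = ("title", "claims")


def train_component_model(docs, component):
    """Count words of one document component (a multinomial naive Bayes model)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc[component].split())
    return counts


def log_prob(counts, vocab_size, words):
    """Laplace-smoothed log-likelihood of `words` under one component model."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)


class BinaryHybridScorer:
    """One binary scorer for one label: a weighted sum of per-component
    log-likelihood ratios (positive model versus negative model)."""

    def __init__(self, pos_docs, neg_docs, weights):
        self.weights = weights  # assumption: hand-set, not learned
        self.models = {}
        for c in COMPONENTS:
            pos = train_component_model(pos_docs, c)
            neg = train_component_model(neg_docs, c)
            vocab_size = len(set(pos) | set(neg)) + 1  # +1 slot for unseen words
            self.models[c] = (pos, neg, vocab_size)

    def score(self, doc):
        total = 0.0
        for c in COMPONENTS:
            pos, neg, vocab_size = self.models[c]
            words = doc[c].split()
            ratio = log_prob(pos, vocab_size, words) - log_prob(neg, vocab_size, words)
            total += self.weights[c] * ratio
        return total


def rank_labels(scorers, doc):
    """Return labels sorted from highest to lowest binary-classifier score."""
    return sorted(scorers, key=lambda label: scorers[label].score(doc), reverse=True)


# Toy training data: (document, set of assigned labels).
TRAIN = [
    ({"title": "optical lens", "claims": "lens focus light"}, {"A"}),
    ({"title": "engine piston", "claims": "piston cylinder fuel"}, {"B"}),
    ({"title": "optical engine sensor", "claims": "lens piston sensor"}, {"A", "B"}),
]
WEIGHTS = {"title": 0.5, "claims": 0.5}

scorers = {
    label: BinaryHybridScorer(
        [d for d, y in TRAIN if label in y],
        [d for d, y in TRAIN if label not in y],
        WEIGHTS,
    )
    for label in ("A", "B")
}

query = {"title": "optical lens", "claims": "lens light"}
ranking = rank_labels(scorers, query)
print(ranking)
```

Because each label has its own scorer, adding a label only requires training one more binary model. Replacing the fixed `WEIGHTS` with learned values would be the step that corresponds to the weight estimation mentioned in the abstract.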
Related resources
Selected Papers: Human Information Science
NTT Communication Science Laboratories conducts scientific research aimed at understanding human information-processing mechanisms and technological research aimed at realizing human-friendly interfaces in computer environments. The ultimate goal of our work is to enrich communications among people as well as between people and computers. This special feature presents the leading edge of our res...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention in recent years, due to the increasing number of modern applications associated with multi-label data. Despite the field's short history, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy for multi-label learning by leveraging label-specific ...
Towards the Real-World Semantic Web— Web Search based on Spatial and Temporal Metadata
To enrich human communication in ubiquitous environments, NTT Communication Science Laboratories have been conducting research on the Real-World Semantic Web. When you deal with real-world information in ubiquitous environments, spatial information and temporal information play important roles. However, it is difficult to deal with spatial or temporal search conditions in current Web search eng...
Multi-label Classification using Logistic Regression Models for NTCIR-7 Patent Mining Task
We design a multi-label classification system based on a machine learning approach for the NTCIR-7 Patent Mining Task. In our system, we employ a logistic regression model for each International Patent Classification (IPC) code that determines the IPC code assignment of research papers. The logistic regression models are trained by using patent documents provided by the task organizers. To mitigate...